AITopics | historical gradient

Collaborating Authors

historical gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fast and Slow Gradient Approximation for Binary Neural Network Optimization

Chen, Xinquan, Gao, Junqi, Qi, Biqing, Li, Dong, Luo, Yiang, Li, Fangyuan, Li, Pengfei

arXiv.org Artificial IntelligenceDec-16-2024

Binary Neural Networks (BNNs) have garnered significant attention due to their immense potential for deployment on edge devices. However, the non-differentiability of the quantization function poses a challenge for the optimization of BNNs, as its derivative cannot be backpropagated. To address this issue, hypernetwork based methods, which utilize neural networks to learn the gradients of non-differentiable quantization functions, have emerged as a promising approach due to their adaptive learning capabilities to reduce estimation errors. However, existing hypernetwork based methods typically rely solely on current gradient information, neglecting the influence of historical gradients. This oversight can lead to accumulated gradient errors when calculating gradient momentum during optimization. To incorporate historical gradient information, we design a Historical Gradient Storage (HGS) module, which models the historical gradient sequence to generate the first-order momentum required for optimization. To further enhance gradient generation in hypernetworks, we propose a Fast and Slow Gradient Generation (FSG) method. Additionally, to produce more precise gradients, we introduce Layer Recognition Embeddings (LRE) into the hypernetwork, facilitating the generation of layer-specific fine gradients. Extensive comparative experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves faster convergence and lower loss values, outperforming existing baselines.Code is available at http://github.com/two-tiger/FSG .

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.11777

Country:

North America > United States (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Huang, Tao, Chen, Ziyang, Meng, Jiayang, Huang, Qingyu, Yang, Xu, Yi, Xun, Khalil, Ibrahim

arXiv.org Artificial IntelligenceJun-23-2024

In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction mapping to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities.

gradient, historical gradient, mini-unlearning, (14 more...)

arXiv.org Artificial Intelligence

2406.16986

Country: Asia > China (0.04)

Genre: Research Report > Promising Solution (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate

Aixiang, null, Chen, null, Zhang, Jinting, Zhang, Zanbo, Li, Zhihong

arXiv.org Machine LearningFeb-20-2022

The fluctuation effect of gradient expectation and variance caused by parameter update between consecutive iterations is neglected or confusing by current mainstream gradient optimization algorithms.Using this fluctuation effect, combined with the stratified sampling strategy, this paper designs a novel \underline{M}emory \underline{S}tochastic s\underline{T}ratified Gradient Descend(\underline{MST}GD) algorithm with an exponential convergence rate. Specifically, MSTGD uses two strategies for variance reduction: the first strategy is to perform variance reduction according to the proportion p of used historical gradient, which is estimated from the mean and variance of sample gradients before and after iteration, and the other strategy is stratified sampling by category. The statistic \ $\bar{G}_{mst}$\ designed under these two strategies can be adaptively unbiased, and its variance decays at a geometric rate. This enables MSTGD based on $\bar{G}_{mst}$ to obtain an exponential convergence rate of the form $\lambda^{2(k-k_0)}$($\lambda\in (0,1)$,k is the number of iteration steps,$\lambda$ is a variable related to proportion p).Unlike most other algorithms that claim to achieve an exponential convergence rate, the convergence rate is independent of parameters such as dataset size N, batch size n, etc., and can be achieved at a constant step size.Theoretical and experimental results show the effectiveness of MSTGD

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2202.10923

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback